keywords:"Web scraping" - Search Results - Digital Repository


	Automatic Webpage Reconstruction Serečun, Viliam ; Ryšavý, Ondřej (referee) ; Veselý, Vladimír (advisor) Many legal institutions require a burden of proof regarding web content. This thesis deals with a problem connected to web reconstruction and archiving. The primary goal is to provide an open source solution that satisfies the requirements of legal institutions. This work presents two main products. The first is a framework, which is a fundamental building block for developing web scraping and web archiving applications. The second product is a web application prototype, which demonstrates how the framework is used. The application output is a MAFF archive file comprising a reconstructed web page, a web page screenshot, and a meta information table. This table shows information about the collected data, server information such as the IP addresses and ports of the device where the original web page is located, and a timestamp. Detailed record
	System for Web Data Source Integration Kolečkář, David ; Bartík, Vladimír (referee) ; Burget, Radek (advisor) The thesis aims at designing and implementing a web application that will be used for the integration of web data sources. For data integration, a method using a domain model of the target information system was applied. The work describes individual methods used for extracting information from web pages. The text describes the process of designing the architecture of the system, including a description of the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, which is written in Java and the Angular framework. The outcome of the work is a web application that allows its users to define web data sources and save data in the target database. Detailed record
	Sentiment Analysis of Czech and Slovak Social Networks and Web Discussions Sojka, Matěj ; Dočekal, Martin (referee) ; Smrž, Pavel (advisor) Thanks to digitalization, the spread of opinions in the population has accelerated sharply in recent years; however, the need to understand them has not changed. The goal of this thesis was to create a system for automatic data collection from social media and web discussions and for sentiment analysis in the Czech and Slovak languages. The system has a web interface for visualizing results and configuring data analysis. The system can suggest topics that it considers to occur in the selected data and group posts based on user-defined opinions. Detailed record
	Relationship between Changes in Betting Odds and Results of Football Matches Jurkovič, Juraj ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor) The goal of this thesis is to demonstrate techniques for solving web scraping and knowledge discovery tasks. The case study is focused on the extraction of data from bookmaker websites and subsequent analysis of the collected data. The thesis demonstrates the implementation of a web scraping task in Python. The thesis describes selected implementation details for developing such a system and proposes a database schema that can be used for this purpose. The collected data is analyzed using statistical methods, and frequent patterns in odds movements are discovered using the Apriori algorithm. Discovered relationships and frequent patterns are presented to the end user. Detailed record
	Sentiment Analysis of Czech Social Networks and Web Discussions on Retail Chains Bolješik, Michal ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) The goal of this thesis is to design and implement a system that analyses data from the web mentioning Czech grocery chain stores. The implemented system is able to download such data automatically, perform sentiment analysis of the data, extract locations and chain stores' names from the data, and index the data. The system also includes a user interface showing the results of the analyses. The first part of the thesis surveys the state of the art in collecting data from the web, sentiment analysis, and document indexing. A description of the system's design and implementation follows. The last part of the thesis evaluates the implemented system. Detailed record
	Methods of Data Extraction from the Web Perina, Lukáš ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor) The purpose of this bachelor thesis is to design the architecture of, and subsequently implement, an application for data extraction (web scraping) from web documents. Unlike conventional methods, the extraction is based on defining data types and regular expressions for the requested elements. Extraction does not require knowledge of the detailed structure of a given web document, and a single definition can be used to detect the requested elements on different web pages. The algorithm achieves an overall accuracy of 85.51% and a recall of 80.28%. This approach can significantly reduce the time required to analyze web pages, and the structure of the code need not be a determining factor when creating web scraping requests. Detailed record
	Identification of Cryptocurrency Users Zrnčík, Henrich ; Matoušek, Petr (referee) ; Veselý, Vladimír (advisor) 3 January 2009 is considered to be the beginning of the cryptocurrency era. This work deals with one of the ways of obtaining information about cryptocurrency users: retrieving it from public websites where users often, knowingly or unknowingly, publish their cryptocurrency addresses for different purposes (donation accounts or online payments which include personal data), which makes it possible to link their crypto account with their real identity. The information is obtained through web scraping. The result is an implemented application capable of automatically collecting data about the identity of cryptocurrency users and storing it in a structured database. Detailed record
	Platform for Cryptocurrency Address Collection Bambuch, Vladislav ; Pluskal, Jan (referee) ; Veselý, Vladimír (advisor) The goal of this thesis is to create a platform for collecting and displaying metadata about cryptocurrency addresses from the public and dark web. To achieve this goal, I used web processing technologies written in PHP. Complications accompanying the automatic processing of web pages were resolved with Apache Kafka and its process-scaling capabilities. The modularity of the platform was achieved through a microservices architecture and Docker containerization. The work enables a unique way of searching for potential criminal activities that took place outside the blockchain, by means of a web application for managing the platform and searching the extracted data. The platform simplifies the addition of new, mutually independent modules, with Apache Kafka mediating the communication between them. The result of this work can be used for the detection and prevention of cybercrime. Users of this system may be law enforcement authorities or other parties and users interested in the reputation and credibility of cryptocurrency addresses. Detailed record
	Analysis of User Settings on Online Social Networks Mlýnek, Martin ; Malinka, Kamil (referee) ; Januš, Filip (advisor) The bachelor thesis deals with the development of a web user interface. The goal was to design and implement an interface for the server part of the Privchecker security tool, which deals with the security of users on social networks. The problem was solved by creating a client web application based on the React JavaScript library. The thesis also addresses the transfer of user credentials, testing of the implemented interface, and analysis of user settings. Detailed record
	Sentiment Analysis from Movie Reviews Bílý, Daniel ; Jon, Josef (referee) ; Smrž, Pavel (advisor) This thesis is focused on creating a system which is capable of downloading movie reviews from the web and analysing them. There are several sources of movie reviews, Czech and English (čsfd, fdb, imdb and rotten tomatoes). The sentiment analysis is performed using machine learning. Results of the analysis are shown in a browser. Detailed record
